The statistical distances between countries, calculated for various moving average time windows, 
are mapped into the ultrametric subdominant space as in classical Minimal Spanning Tree methods. 
The Moving Average Minimal Length Path (MAMLP) algorithm allows a decoupling of fluctuations 
with respect to the mass center of the system from the movement of the mass center itself. A 
Hamiltonian representation given by a factor graph is used and plays the role of cost function. 
The present analysis pertains to 11 macroeconomic (ME) indicators, namely the GDP ($x_1$), Final 
Consumption Expenditure ($x_2$), Gross Capital Formation ($x_3$), Net Exports ($x_4$), Consumer Price 
Index ($y_1$), Rates of Interest of the Central Banks ($y_2$), Labour Force ($z_1$), Unemployment ($z_2$), 
GDP/hour worked ($z_3$), GDP/capita ($w_1$) and Gini coefficient ($w_2$). The target group of countries 
is composed of 15 EU countries, data taken between 1995 and 2004. By two different methods (the 
Bipartite Factor Graph Analysis and the Correlation Matrix Eigensystem Analysis) it is found that 
the strongly correlated countries with respect to the macroeconomic indicators fluctuations can be 
partitioned into stable clusters. 

I. INTRODUCTION 

Modeling the dependences between the macroeconomic (ME) variables has to take into account circumstances that 
differ substantially from those encountered in the natural sciences. First, experimentation is usually not feasible 
and is replaced by survey research, implying that the explanatory variables cannot be manipulated and fixed by the 
researcher. Second, the number of possible explanatory variables is often quite large, unlike the small number of 
carefully chosen treatment variables frequently found in the natural sciences. Third, the ME time series are short and 
noisy. Most data have a yearly frequency. When social time series have been produced for a very long period, there 
is usually strong evidence against stationarity. 

Some macroeconomic (ME) indicators are recorded monthly and/or quarterly, which increases the number 
of available data points, but some additional noise is naturally contained in the time series so generated (seasonal 
fluctuations, external and internal short range shocks, etc). This seems to be a solid reason why the 
main data sources, at least the ones freely available on the web, tend to keep only the annual averages/rates of growth 
of the ME indicators. 

Let us consider, for example, a time interval of one hundred years, which is mapped onto a graphical plot of 100 
data points. From the statistical physics viewpoint, 100 is quite a small number of data points, surely too small for 
speaking about the so-called thermodynamic limit. On the other hand, from a socio-economic point of view, we can 
justifiably wonder whether a growth of, say, 2% of any ME indicator has at the present time the same meaning as it had 
one century ago. One must take into account that during that time the social, political and economic environment 
changed drastically. Moreover, the methodology of data collecting and processing is today different from what it was 
two generations ago. Indeed, the economic world is created by people and changes substantially from one generation 
to the next (sometimes also within one and the same generation). Thus, this way of aggregating statistical data 
turns out to be controversial. 

Several papers [?] investigated the statistical patterns in GDP annual rates of growth by aggregating (in a 
"horizontal" way) the data from all countries for which statistical data were reported. Even if all data are supposed to 
be reliable, and even if the relative rates of growth are investigated (to diminish the influence of the large actual differences), 
this way of aggregation, as well as the previous one, supposes a priori a certain degree of homogeneity across countries. 
A certain GDP rate of growth in an underdeveloped country is certainly based on factors that differ substantially from 
the ones that generate the same rate of growth in a developed country. Both theoretical and empirical investigations 
[?] reported evidence that countries partition into clusters according to their common patterns of evolution. For such 
subsystems only, the data might be meaningfully aggregated. In the present paper we demonstrate the emergence of 
clustering in the relatively stable and homogeneous system composed of the 15 EU countries for data taken between 
1994 and 2004, starting from the annual rates of growth of 11 ME indicators, namely the GDP ($x_1$), Final Consumption 
Expenditure ($x_2$), Gross Capital Formation ($x_3$), Net Exports ($x_4$), Consumer Price Index ($y_1$), Rates of Interest of 
the Central Banks ($y_2$), Labour Force ($z_1$), Unemployment ($z_2$), GDP/hour worked ($z_3$), GDP/capita ($w_1$) and Gini 
coefficient ($w_2$). 

One has to stress here that the problem of studying the patterns of growth across countries is currently a subject of 
great interest to economists [?]. An important reason for the increasing interest in this problem is that persistent 
disparities in aggregate growth rates across countries have, over time, led to large differences in welfare. On the 
other hand, the intellectual payoffs are high: various statistical tools might be considerably enriched and extended by 
applying them to the non-stationary, short and noisy macroeconomic time series. 

In the present paper we focus on two recent lines of research, of growing interest in physics, which can bring 
important contributions to ME time series analysis: on one hand, the recent developments in nonequilibrium networks 
[?]; on the other hand, the random matrix theory (RMT), initially developed in nuclear physics and also successfully used 
in the study of canonical correlations between stock changes and in the portfolio optimization problem [?]. The way in which 
these methods are adapted to the macroeconomic time series analysis is described in the next section. 

The Minimal Spanning Tree (MST) is one of the most usual methods in cluster analysis, and has been widely used 
so far both by physicists [?] and economists [?]. Nonetheless, both sides [?] noted some lack of uniqueness due to 
the choice of the MST root. Moreover, the MST structure proves not to be stable when a constant size time window 
is moved over the considered time span. The solution briefly presented in Section 3, namely the Moving Average 
Minimal Length Path (MAMLP) method, develops some previous methods in which the arbitrariness 
in the root of the tree was pointed out, and in which an a priori more common root, such as the sum of the data (called 
the "All" country), was used to grow the tree, permitting a better comparison [?]. 

The target group of countries is composed of 15 EU countries, data taken between 1994 and 2004. The main 
sources used for the annual rates of all the above indicators are the World Bank database [?] and the OECD database 
[?]. We abbreviate the countries according to the Roots Web Surname List (RSL), which uses standardized three-letter 
abbreviations to designate countries and other regional locations (http://helpdesk.rootsweb.com/codes/). Inside the 
tables, for spacing reasons we use the two-letter country abbreviations (http://www.iso.org). 

The remainder of the paper is organized as follows: in Section 2 the theoretical and methodological tools from 
network analysis and matrix theory which we try to adapt to the considered time series are briefly described. The 
results are presented and discussed at length in Section 3. Some concluding remarks are made in Section 4. 



II. THEORETICAL AND METHODOLOGICAL FRAMEWORK 



As mentioned in Sect. 1, the MST cannot be built in a unique way, which becomes a problem when we try to 
construct a cluster hierarchy for each position of a moving time window. The hierarchical structure proved not to be 
robust against fluctuations induced by a moving time window. In the MAMLP method described below 
we propose to construct the hierarchy starting from a virtual 'average' agent. The method is developed in the 
following steps: (i) An 'AVERAGE' agent (AV) is virtually included into the system; the statistical distance matrix 
is constructed, and thereafter the elements are set into increasing order (i.e. the decreasing order of correlations). (ii) 
The hierarchy is constructed, connecting each agent by its minimal length path (MLP) to AV. Its minimal distance to 
AV is associated to each agent. (iii) The procedure is repeated by moving a given and constant time window (in this 
case a 5-year time window) over the investigated time span (in the present analysis: 1994-2004). The agents are 
sorted through their movement inside the hierarchy. Therefore, a new correlation matrix between country distances 
to their own mean is constructed. The matrix elements are defined as: 

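$$ C_{ij} = \frac{\langle d_i(t)\, d_j(t) \rangle - \langle d_i(t) \rangle \langle d_j(t) \rangle}{\sqrt{\left\langle d_i(t)^2 - \langle d_i(t) \rangle^2 \right\rangle \left\langle d_j(t)^2 - \langle d_j(t) \rangle^2 \right\rangle}} \qquad (1) $$

where $d_i(t)$ is the minimal length path (MLP) distance of country $i$ to the AVERAGE. For simplicity, the explicit 
dependencies on the time window size T are not included in Eq. (1). 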
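
Steps (i)-(iii) can be sketched in Python as follows. This is only a minimal illustration under stated assumptions, not the implementation used for the results: it assumes the statistical distance $\sqrt{2(1 - C)}$ common in MST studies, takes the AVERAGE agent to be the cross-country mean of the series, and reads the minimal length path as the subdominant ultrametric (minimax) path. The function names and the synthetic input data are hypothetical.

```python
import numpy as np

def statistical_distance(series):
    """Distance matrix sqrt(2 * (1 - C)) between the rows of `series` (assumed form)."""
    corr = np.corrcoef(series)
    return np.sqrt(np.clip(2.0 * (1.0 - corr), 0.0, None))

def minimax_distance_to(dist, source):
    """Subdominant ultrametric distance from `source` to every node:
    the smallest achievable value of the largest edge along a connecting path."""
    n = dist.shape[0]
    best = np.full(n, np.inf)
    best[source] = 0.0
    done = np.zeros(n, dtype=bool)
    for _ in range(n):
        # Pick the closest node not yet finalized (Prim/Dijkstra-style sweep).
        u = int(np.argmin(np.where(done, np.inf, best)))
        done[u] = True
        # A path through u costs the larger of (cost to reach u, edge u -> v).
        via_u = np.maximum(best[u], dist[u])
        best = np.where(done, best, np.minimum(best, via_u))
    return best

def mamlp_distances(data, window):
    """Distance of each agent to AVERAGE for every position of the moving window.

    data: array of shape (n_agents, n_years); returns (n_windows, n_agents)."""
    n_agents, n_years = data.shape
    rows = []
    for start in range(n_years - window + 1):
        chunk = data[:, start:start + window]
        with_av = np.vstack([chunk, chunk.mean(axis=0)])  # append the AVERAGE agent
        dist = statistical_distance(with_av)
        rows.append(minimax_distance_to(dist, n_agents)[:n_agents])
    return np.array(rows)

# Synthetic stand-in for 15 countries observed over the 11 years 1994-2004.
rng = np.random.default_rng(0)
growth = rng.normal(size=(15, 11))
d = mamlp_distances(growth, window=5)   # one row per window position
c = np.corrcoef(d.T)                    # Eq. (1): correlations between countries
```

With an 11-year span and a 5-year window the sketch yields 7 window positions, so each correlation coefficient in `c` is computed from 7 distances.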

Let us recall that for systems with discrete degrees of freedom, denoted by $s$, the statistical mechanical models 
are generally defined through the Hamiltonian $H = H(s)$, which is typically a sum of terms, each involving a small 
number of variables. A useful representation is given by the factor graph [12]. A factor graph is a bipartite graph 
made of variable nodes $i, j, \ldots$, one for each variable, and function nodes $a, b, \ldots$, one for each term of the Hamiltonian. 
In the present approach the variable nodes are the macroeconomic indicators and the function nodes are the countries. 
An edge joins a variable node $i$ and a function node $a$ if and only if $i \in a$, i.e., the variable $s_i$ appears in $H_a$, the 
term of the Hamiltonian associated to $a$. The Hamiltonian can then be written as: 

$$ H = \sum_a H_a(s_a), \quad \text{with } s_a = \{ s_i,\ i \in a \} \qquad (2) $$

In combinatorial optimization problems [12], the Hamiltonian plays the role of a cost function. In the low-temperature 
limit $T \to 0$, only the minimal energy states (ground states) have a non-vanishing probability, and these are the 
states one is interested in. 

TABLE I: MLP distances to AVERAGE. The moving time window size is 5 years for data taken from 1994 to 2004. 

           AU    BE    DE    DK    ES    FI    FR    UK    GR    IE    IT    LU    NL    PT    SE
94-98     .67   .86   .86   .86   .40   .40   .67   .86   .40   .86   .86   .40   .40   .86   .86
95-99     .60   .65   .52   .71   .21   .77   .45   .77   .37   .65   .90   .37   .23   .83   .52
96-00     .58   .32   .46   .61   .34   .81   .46   .32   .32   .53   .32   .20   .60   .60   .46
97-01     .48   .30   .48   .30   .28   .42   .48   .44   .68   .38   .68   .14   .28   .28   .48
98-02     .43   .26   .19   .19   .21   .43   .19   .19  1.04   .29   .44   .12   .21   .21   .29
99-03     .25   .23   .19   .19   .29   .26   .19   .37  1.15   .26   .37   .23   .19   .19   .28
00-04     .27   .27   .17   .26   .28   .27   .21   .27   .53   .50   .28   .27   .21   .21   .27

Usually, a cluster $k$ is defined as a subset of the factor graph such that if a function node $a$ belongs to $k$, then all the 
variable nodes $i \in a$ also belong to $k$ (while the converse need not be true, otherwise the only legitimate clusters 
would be the connected components of the factor graph). Here, this condition will be relaxed by partitioning the 
function nodes according to whether or not they are connected to a certain variable node. 

Once the correlation matrix is constructed, it is natural to ask for the interpretation of its eigenvalues and 
eigenvectors. Note that since the matrix is symmetric, the eigenvalues are all real numbers. We will call $v_a$ the normalized 
eigenvector corresponding to the eigenvalue $\lambda_a$, with $a = 1, 2, \ldots, M$. The vector $v_a$ is the list of the weights $v_{a,i}$ of the 
different countries in the corresponding linear combination. The variance of such a combination is thus: 

$$ \sigma_a^2 = \left\langle \left( \sum_{i=1}^{M} v_{a,i}\, d_i \right)^2 \right\rangle = \sum_{i,j=1}^{M} v_{a,i}\, v_{a,j}\, C_{ij} = \lambda_a \qquad (3) $$

Furthermore, using the fact that different eigenvectors are orthogonal, we obtain a set of uncorrelated random 
fluctuations $e_a$, constructed from the weights $v_{a,i}$: 

$$ e_a = \sum_{i=1}^{M} v_{a,i}\, d_i, \quad \text{where } \langle e_a e_b \rangle = \lambda_a \delta_{a,b} \qquad (4) $$

Conversely, one can think of the initial distances as a linear combination of the uncorrelated factors $e_a$: 

$$ d_i = \sum_{a=1}^{M} v_{a,i}\, e_a \qquad (5) $$

In this decomposition, usually called "principal component analysis", the correlated fluctuations of a set of 
random variables are decomposed in terms of the fluctuations of underlying uncorrelated factors. In the case of the 
country clustering, the principal components $e_a$ could have an economic interpretation in terms of the macroeconomic 
indicators. 
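
The decomposition above can be checked numerically. The following is a small sketch on synthetic data, assuming that the variables are centred and normalized to unit variance so that their covariance matrix coincides with the correlation matrix; the sizes `T` and `M` and the random input are hypothetical and unrelated to the EU-15 data.

```python
import numpy as np

rng = np.random.default_rng(1)
T, M = 200, 5                                 # hypothetical: T observations of M variables
mixing = rng.normal(size=(M, M))
d = rng.normal(size=(T, M)) @ mixing          # correlated synthetic variables
d = (d - d.mean(axis=0)) / d.std(axis=0)      # centre and normalize each column

C = d.T @ d / T                               # correlation matrix
lam, v = np.linalg.eigh(C)                    # columns of v are the eigenvectors v_a

e = d @ v                                     # uncorrelated factors e_a = sum_i v_{a,i} d_i
cov_e = e.T @ e / T                           # expected to equal diag(lambda_a)
d_back = e @ v.T                              # reconstruction d_i = sum_a v_{a,i} e_a
```

Here the diagonal of `cov_e` holds the variances of the combinations (the eigenvalues), its off-diagonal entries vanish up to rounding, and `d_back` reproduces `d`.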
TABLE II: The correlation matrix of EU-15 country movements inside the hierarchy. Indicator: GDP. The moving time window 
size is 5 years for data taken from 1994 to 2004. 

Since, as is generally accepted [?, 13], the eigenvectors associated with the largest eigenvalues are the ones carrying 
the useful information, one can try to define clusters on the basis of the structure of these eigenvectors. Often (but not 
always), the first one, $v_1$, has components of comparable size and of the same sign on all countries, and defines the 
largest cluster, containing all countries. The second one, $v_2$, which by construction has to be orthogonal to $v_1$, may 
have some of its components positive and the others negative. This means that a probable move of the countries around the average (global) 
fluctuations occurs when some countries over-perform the average, and others under-perform it. Therefore, the sign 
of the components of $v_2$ can be used to group the countries in two families. Each family can then be divided further, 
using the relative signs of $v_3$, $v_4$, etc. 

III. RESULTS 
A. The statistics of the correlation coefficients 

In order to exemplify the MAMLP method, the corresponding steps for $x_1$ = GDP are explicitly shown below. 
First, the virtual 'AVERAGE' country is introduced in the system. The statistical distances corresponding to the 
fixed 5-year moving time window are set in increasing order and the minimal length path (MLP) connections to the 
AVERAGE are established for each country in every time interval (Table I). 

The resulting hierarchy is found to change from one time interval to another. Therefore, the corresponding correlation 
matrix is built, this time for the country movements inside the hierarchy (Table II). The above procedure is repeated 
for each macroeconomic indicator. Thus, the MAMLP method leads us to a set of M = 11 correlation matrices, 
having size $N \times N$, where N = 15 is the number of countries under consideration. 
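
As an illustration, the GDP correlation matrix can be recomputed from the distances listed in Table I. This is a sketch only: the Table I values are rounded to two decimals, so the coefficients obtained here may differ slightly from those computed from unrounded distances, and the variable names are ours.

```python
import numpy as np

countries = ["AU", "BE", "DE", "DK", "ES", "FI", "FR", "UK",
             "GR", "IE", "IT", "LU", "NL", "PT", "SE"]
# Rows: window positions 94-98 ... 00-04; columns: countries (values copied from Table I).
mlp = np.array([
    [.67, .86, .86, .86, .40, .40, .67, .86, .40, .86, .86, .40, .40, .86, .86],
    [.60, .65, .52, .71, .21, .77, .45, .77, .37, .65, .90, .37, .23, .83, .52],
    [.58, .32, .46, .61, .34, .81, .46, .32, .32, .53, .32, .20, .60, .60, .46],
    [.48, .30, .48, .30, .28, .42, .48, .44, .68, .38, .68, .14, .28, .28, .48],
    [.43, .26, .19, .19, .21, .43, .19, .19, 1.04, .29, .44, .12, .21, .21, .29],
    [.25, .23, .19, .19, .29, .26, .19, .37, 1.15, .26, .37, .23, .19, .19, .28],
    [.27, .27, .17, .26, .28, .27, .21, .27, .53, .50, .28, .27, .21, .21, .27],
])

# Pearson correlation between the columns, i.e. between country movements, as in Eq. (1).
corr = np.corrcoef(mlp, rowvar=False)

def coefficient(a, b):
    """Correlation between the movements of countries a and b (two-letter codes)."""
    return corr[countries.index(a), countries.index(b)]
```

For example, `coefficient("DE", "SE")` returns the entry linking Germany and Sweden.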

First, we analyse the whole set of correlation coefficients. A correlation coefficient $C_{ij}$ will be taken into account 
as representing a strong connection if and only if $|C_{ij}| > C_{thr}$, where $C_{thr}$ is a certain a priori chosen threshold 
value. For small values of $C_{thr}$, all 15 countries have at least one strong connection, i.e. the graph is fully 
connected. As $C_{thr}$ increases, the number of connections decreases. In Fig. 1 the relative number of links (the 
ratio between the number of actual links and the number of all possible links) is plotted versus the threshold value. 
One can observe that the data are well fitted by a low-order polynomial. In Fig. 1 the cumulative distribution of 
the correlation coefficients is also plotted (here the ordinates are the cumulative frequencies and the abscissas are the 
corresponding correlation coefficients). For comparison, the cumulative uniform distribution is also plotted. The high 
value of the square of the Pearson product-moment correlation coefficient, $R^2 > 0.99$, indicates a good agreement 
between the two distributions. 

FIG. 1: The cumulative distribution of the correlation coefficients and the relative number of connections versus the $|C|$ 
(respectively $C_{thr}$) value. 
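
The relative number of links for a given threshold can be computed as in the short sketch below. The helper name and the 3 x 3 toy matrix are hypothetical; in the analysis the count would run over the 15 x 15 matrices of all indicators.

```python
import numpy as np

def relative_number_of_links(corr, threshold):
    """Fraction of distinct pairs (i < j) whose |C_ij| exceeds the threshold."""
    upper = np.abs(corr[np.triu_indices_from(corr, k=1)])
    return float(np.mean(upper > threshold))

toy = np.array([[1.0, 0.9, -0.2],
                [0.9, 1.0, 0.5],
                [-0.2, 0.5, 1.0]])

low = relative_number_of_links(toy, 0.4)    # two of the three pairs survive
high = relative_number_of_links(toy, 0.95)  # no pair survives
```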

Nevertheless, performing the $\chi^2$ test over the whole set of correlation coefficients, we must reject, at the 99 % 
confidence level, the null hypothesis that the $|C|$ distribution is uniform. Inspecting the data set by eye, 
one notices an anomalously large number of correlation coefficients ($N_{20} = 100$) in the range 0.95-1.00, while the 
mean count per bin is 57.75 and the standard deviation is $\sigma = 7.45$. According to Chebyshev's theorem [14], 
an interval of ±4 standard deviations around the mean contains at least about 94 % of the data (for an arbitrary distribution). 
Thus, the last point of the distribution can be treated as an outlier and, performing the $\chi^2$ test for 
the remaining points, we can accept the hypothesis of the same distribution at a confidence level of over 75 %. We 
must note here that the same conclusion is supported by Student's t-test at a confidence level of 100 %, the two 
distributions having exactly the same mean. Joining together the results of the statistical tests, we can conclude that 
the correlation coefficients follow a uniform distribution. 
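
The outlier argument can be made explicit with a short worked computation, using only the figures quoted above. Chebyshev's inequality with $k = 4$ gives the stated bound, and the last bin lies well outside the corresponding interval:

$$ P\left( |X - \mu| \ge k\sigma \right) \le \frac{1}{k^2}, \qquad k = 4: \quad 1 - \frac{1}{16} = 0.9375 \approx 94\,\% $$

$$ \frac{N_{20} - \mu}{\sigma} = \frac{100 - 57.75}{7.45} \approx 5.67 > 4 $$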



B. The bipartite factor graph analysis 



As has already been shown, the factor graph structure is strongly dependent on the threshold value $C_{thr}$. In order 
to establish the most appropriate $C_{thr}$, a two-tailed t-test of statistical significance is performed over the correlation 
matrix elements [?]. The null hypothesis (a correlation coefficient of zero) assumes that there is no linear relationship 
between the two variable sets. In order to test the significance of the correlation coefficients we use the test statistic: 

$$ t = \frac{r \sqrt{n - 2}}{\sqrt{1 - r^2}} \qquad (6) $$

where $r = C_{ij}$ and $n - 2$ is the number of degrees of freedom. The correlation coefficient is considered to be 
statistically significant if the computed $t$ value is greater than the critical value $t_c$ of Student's t-distribution at 
a level of significance $\alpha$. From Eq. (6) one derives: 

$$ r = \frac{t}{\sqrt{t^2 + n - 2}} \qquad (7) $$

Taking n = 7 (the number of statistical distances used for computing each correlation coefficient), from the Student's 
t-distribution tables we find the critical value $t_c = 3.365$ for a reasonable level of significance $\alpha = 0.02$ (or, equivalently, 
a 98 % confidence level). From Eq. (7) we get $r_c = C_{thr} = 0.83$, i.e. the null hypothesis can only be rejected for the 
correlation coefficients greater than or equal to this value. The significant correlation coefficients are emphasized 
in bold in Table II. 

FIG. 2: The eigenvalue spectrum of the correlation matrices between EU-15 country movements with respect to AVERAGE, 
for each ME indicator (inset). RM: the eigenvalue spectrum of the random matrix. 
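
The threshold quoted above follows from inverting the test statistic for $r$. A minimal sketch of the arithmetic, with the critical value taken from the text rather than computed (the function names are ours):

```python
import math

def t_statistic(r, n):
    """Test statistic for a correlation coefficient r computed from n points."""
    return r * math.sqrt(n - 2) / math.sqrt(1.0 - r ** 2)

def critical_correlation(t_c, n):
    """Invert the test statistic: the value of r at which t equals t_c."""
    return t_c / math.sqrt(t_c ** 2 + n - 2)

r_c = critical_correlation(t_c=3.365, n=7)   # values quoted in the text
print(round(r_c, 2))                         # 0.83
```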
TABLE III: The first eigenvector components 

       GDP    CONS    CAPF    NEXP     CPI    INTR    LABF   UNEMP    GDPH    GDPC    GINI
AU  -0.276  -0.300   0.373  -0.328  -0.109  -0.274   0.239   0.305  -0.294  -0.289  -0.261
BE  -0.287  -0.325   0.357   0.189   0.003  -0.271   0.308   0.229  -0.351  -0.259  -0.371
DE  -0.296  -0.304   0.257  -0.371  -0.334  -0.274  -0.343   0.299  -0.284  -0.261  -0.122
DK  -0.303  -0.097   0.281   0.111  -0.003  -0.276  -0.293  -0.250  -0.161  -0.287  -0.131
ES  -0.167  -0.325   0.356  -0.171  -0.260  -0.276   0.331  -0.271   0.244  -0.275   0.360
FI  -0.155  -0.159   0.277   0.077   0.342  -0.268  -0.199  -0.322  -0.343  -0.213  -0.047
FR  -0.288  -0.188   0.356   0.282   0.368  -0.272   0.100   0.372  -0.320  -0.229   0.317
UK  -0.274  -0.321   0.088   0.244   0.003  -0.234   0.328  -0.322  -0.352  -0.250  -0.310
GR  -0.239  -0.103   0.132   0.048  -0.266  -0.189   0.152   0.230   0.130   0.257   0.360
IE  -0.290  -0.325   0.274   0.351   0.300  -0.276  -0.163  -0.322   0.068  -0.282   0.188
IT  -0.236   0.001  -0.053  -0.354  -0.363  -0.276  -0.308   0.105   0.045  -0.222   0.216
LU  -0.231   0.026  -0.140   0.077  -0.266  -0.201   0.299  -0.140  -0.210  -0.251  -0.107
NL  -0.165  -0.325   0.059   0.056   0.110  -0.274   0.151  -0.194  -0.207  -0.272  -0.345
PT  -0.297  -0.325  -0.030  -0.387  -0.341  -0.276  -0.277  -0.029  -0.320  -0.254   0.262
SE  -0.293  -0.325   0.361   0.351  -0.254  -0.208   0.209   0.239   0.258  -0.257  -0.154



It is interesting to remark that the two plots in Fig. 1 intersect at the abscissa 0.83, which is equal to the $r_c$ 
found above. The intersection point seems to correspond to an optimal choice of $C_{thr}$, under the constraint of the 
competition between removing links and keeping the remaining correlations. 

One can easily see that not all 15 countries (function nodes) are connected through the variable node $x_1$ (GDP 
fluctuations), but only 11 of them. Their contributions to the Hamiltonian include the variable $x_1$. 

The above procedure is repeated for each ME variable and leads us to the Hamiltonian (or cost function) having the form: 

$$
\begin{aligned}
H ={} & \mathrm{AUT}(x_1, x_2, x_3, x_4, y_2, z_1, z_2, z_3, w_1, w_2) \\
 & + \mathrm{BEL}(x_1, x_2, x_3, y_1, y_2, z_1, z_3, w_1, w_2) \\
 & + \mathrm{DEU}(x_1, x_2, x_4, y_1, y_2, z_1, z_2, z_3, w_1, w_2) \\
 & + \mathrm{DNK}(x_1, x_3, x_4, y_2, z_1, z_2, w_1, w_2) \\
 & + \mathrm{ESP}(x_2, x_3, y_2, z_1, z_2, w_1, w_2) \\
 & + \mathrm{FIN}(x_3, x_4, y_1, y_2, z_2, z_3, w_1, w_2) \\
 & + \mathrm{FRA}(x_1, x_3, x_4, y_1, y_2, z_2, z_3, w_1, w_2) \\
 & + \mathrm{GBR}(x_1, x_2, x_3, x_4, y_1, y_2, z_1, z_2, z_3, w_1, w_2) \\
 & + \mathrm{GRC}(x_4, y_1, z_2, w_1, w_2) \\
 & + \mathrm{IRL}(x_1, x_2, x_3, x_4, y_1, y_2, z_2, w_1, w_2) \\
 & + \mathrm{ITA}(x_1, x_4, y_1, y_2, z_1, z_2, w_1, w_2) \\
 & + \mathrm{LUX}(x_1, x_4, y_1, y_2, z_1, z_2, z_3, w_1, w_2) \\
 & + \mathrm{NLD}(x_2, x_4, y_2, z_2, w_1, w_2) \\
 & + \mathrm{PRT}(x_1, x_2, x_3, x_4, y_1, y_2, z_1, z_2, z_3, w_1, w_2) \\
 & + \mathrm{SWE}(x_1, x_2, x_3, x_4, y_1, y_2, z_2, w_1, w_2).
\end{aligned}
$$
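
The bipartite structure encoded in this Hamiltonian can be held in a simple mapping from function nodes (countries) to the variable nodes (indicators) they are tied to. The sketch below transcribes the expression above, with the subscripts as we read them; the helper name `tied_countries` is hypothetical.

```python
# Each country (function node) mapped to the indicators (variable nodes) in its term.
_terms = {
    "AUT": "x1 x2 x3 x4 y2 z1 z2 z3 w1 w2",
    "BEL": "x1 x2 x3 y1 y2 z1 z3 w1 w2",
    "DEU": "x1 x2 x4 y1 y2 z1 z2 z3 w1 w2",
    "DNK": "x1 x3 x4 y2 z1 z2 w1 w2",
    "ESP": "x2 x3 y2 z1 z2 w1 w2",
    "FIN": "x3 x4 y1 y2 z2 z3 w1 w2",
    "FRA": "x1 x3 x4 y1 y2 z2 z3 w1 w2",
    "GBR": "x1 x2 x3 x4 y1 y2 z1 z2 z3 w1 w2",
    "GRC": "x4 y1 z2 w1 w2",
    "IRL": "x1 x2 x3 x4 y1 y2 z2 w1 w2",
    "ITA": "x1 x4 y1 y2 z1 z2 w1 w2",
    "LUX": "x1 x4 y1 y2 z1 z2 z3 w1 w2",
    "NLD": "x2 x4 y2 z2 w1 w2",
    "PRT": "x1 x2 x3 x4 y1 y2 z1 z2 z3 w1 w2",
    "SWE": "x1 x2 x3 x4 y1 y2 z2 w1 w2",
}
factor_graph = {country: set(term.split()) for country, term in _terms.items()}

def tied_countries(variable):
    """Countries whose term of the Hamiltonian contains the given variable."""
    return sorted(c for c, variables in factor_graph.items() if variable in variables)

print(len(tied_countries("x1")))   # 11
```

The count for $x_1$ agrees with the 11 countries connected through the GDP node.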



C. The correlation matrix analysis 



From the result of the bipartite graph analysis, a binary partition of the countries with respect to each ME variable can 
already be seen: a country is either connected or not connected to the respective variable node. Nonetheless, a complete 
solution to this problem can only be obtained by analyzing the correlation matrix eigensystems. A parallel to similar results from 
stock market investigations [?] can also be drawn. 

The eigenvalue spectrum for the empirical correlation matrices is plotted in Fig. 2 for all the ME variables. The 
results are compared with those of a random uncorrelated matrix (RM), having the same size (15 x 15), constructed 
by generating random numbers. 
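
A reference spectrum of this kind can be generated as in the sketch below. The text does not specify how the random numbers were drawn, so the Gaussian entries, the series length of 7 points (chosen to match the number of window positions) and the function name are all assumptions of this illustration.

```python
import numpy as np

def random_correlation_spectrum(n_series=15, n_points=7, seed=0):
    """Eigenvalues (descending) of the correlation matrix of independent Gaussian series."""
    rng = np.random.default_rng(seed)
    series = rng.normal(size=(n_series, n_points))
    corr = np.corrcoef(series)                 # n_series x n_series correlation matrix
    return np.sort(np.linalg.eigvalsh(corr))[::-1]

spectrum = random_correlation_spectrum()
```

Because each series has only 7 points, at most 6 eigenvalues of the 15 x 15 matrix are non-zero, and they sum to 15 (the trace); this is one way to see how strong the finite size effects are for such short series.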

In stock market analysis the largest eigenvalue, often called "market effect", is supposed to describe the collective 
movement of stock prices, because the corresponding eigenvector components have the same sign and approximately 
the same size. Looking at the first and second eigenvector components (Tables III and IV) one can easily see that, 
for the ME correlation matrices, the above interpretation is only partially valid, for $x_1$ = GDP, $x_2$ = Consumption, 
$x_3$ = Capital Formation, $w_1$ = GDP/capita and $y_2$ = Interest Rates. The fluctuations of these indicators seem to 
reflect a global similarity, as a result of the so-called "globalization trend". The same result was also found in [?] for 
the first four indicators, by another method, namely measuring the mean statistical distances between countries. The 
fifth indicator analyzed in [?] was the Net Exports, for which no occurrence of this effect was reported, in perfect 
agreement with the present results. 

TABLE IV: The second eigenvector components 

       GDP    CONS    CAPF    NEXP     CPI    INTR    LABF   UNEMP    GDPH    GDPC    GINI
AU   0.014  -0.155   0.043  -0.030  -0.285  -0.079   0.393   0.268  -0.204  -0.078   0.121
BE  -0.236  -0.042  -0.124   0.279  -0.179  -0.074  -0.026  -0.060  -0.086   0.224   0.051
DE   0.013  -0.141   0.204  -0.110  -0.162  -0.046   0.009   0.273   0.174   0.295   0.339
DK   0.052   0.335  -0.315  -0.433   0.387   0.003  -0.238   0.335   0.276  -0.099  -0.397
ES   0.247  -0.033   0.146  -0.094  -0.234  -0.032  -0.040  -0.197  -0.192  -0.232  -0.083
FI   0.404   0.427  -0.306  -0.423  -0.164  -0.114   0.359  -0.054   0.006  -0.424  -0.385
FR   0.079   0.142   0.146   0.012  -0.149  -0.086  -0.256  -0.012   0.194   0.268   0.190
UK  -0.309   0.039  -0.420  -0.191   0.085   0.314  -0.110  -0.061  -0.011   0.092   0.103
GR   0.238   0.332   0.266  -0.356   0.241  -0.605  -0.399  -0.358   0.340   0.283  -0.083
IE  -0.055  -0.042  -0.075   0.156  -0.343  -0.020  -0.385  -0.196   0.429  -0.108   0.295
IT  -0.323  -0.456  -0.417   0.040   0.051  -0.032  -0.172   0.000   0.306   0.402   0.340
LU  -0.306   0.560  -0.090  -0.423  -0.309   0.471  -0.113   0.424   0.392   0.199   0.300
NL   0.576  -0.033  -0.264  -0.372  -0.448  -0.079  -0.355   0.381  -0.352  -0.186   0.109
PT   0.007  -0.033  -0.438   0.052  -0.094  -0.032   0.129   0.443   0.126  -0.323  -0.241
SE  -0.062  -0.033   0.094   0.156  -0.342   0.519   0.296   0.061   0.286   0.318  -0.372



D. Clustering method and results 

The clustering scheme can next be elaborated as follows: first, the so-called first-order clusters are selected using 
the bipartite factor graph, i.e. the clusters of countries having at least one connection to the respective 
variable node. The countries are further partitioned according to the sign and the magnitude of the eigenvector components, 
using Table IV (for $x_1$, $x_2$, $x_3$, $y_2$ and $w_1$) and Table III (for the others). For several indicators ($x_1$, $x_2$ and $z_3$) we 
also selected some groups that can be called second-order clusters, including some countries which are not tied in the 
factor graph, but have important contributions to the eigenvector structure, i.e. large components. These clusters 
are written in parentheses in Table V. 
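
The sign criterion can be illustrated for GDP with the second-eigenvector components of Table IV, restricted to the 11 countries whose term of the Hamiltonian contains $x_1$. This sketch implements the sign split only and omits the magnitude criterion; the function name is hypothetical.

```python
# Second-eigenvector components for GDP, copied from Table IV.
v2_gdp = {"AU": 0.014, "BE": -0.236, "DE": 0.013, "DK": 0.052, "ES": 0.247,
          "FI": 0.404, "FR": 0.079, "UK": -0.309, "GR": 0.238, "IE": -0.055,
          "IT": -0.323, "LU": -0.306, "NL": 0.576, "PT": 0.007, "SE": -0.062}
# Countries tied to the GDP variable node (those whose Hamiltonian term contains x1).
tied = {"AU", "BE", "DE", "DK", "FR", "UK", "IE", "IT", "LU", "PT", "SE"}

def split_by_sign(components, members):
    """Partition `members` by the sign of their eigenvector component."""
    positive = sorted(c for c in members if components[c] > 0)
    negative = sorted(c for c in members if components[c] < 0)
    return positive, negative

positive, negative = split_by_sign(v2_gdp, tied)
print(positive)  # ['AU', 'DE', 'DK', 'FR', 'PT']
print(negative)  # ['BE', 'IE', 'IT', 'LU', 'SE', 'UK']
```

The positive group coincides with the GDP cluster AUT-DEU-DNK-FRA-PRT. The negative group contains BEL-GBR-ITA-LUX together with IRL and SWE, whose components are the smallest in magnitude within that group; this is consistent with the magnitude criterion, which the sketch does not attempt to reproduce.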

Looking at the development indicators ($x_1$, $x_2$, $x_3$, $x_4$ and $w_1$), we find approximately the same clustering scheme 
as reported in [?], but more extended. There is some agreement with the results reported by Chen in [?] regarding 
the co-movement between real activity and prices during the period 1992-1997, i.e. the partition of FRA-DEU and 
ITA into different clusters with respect to the Consumer Price Index fluctuations. Moreover, there is agreement with 
the MST constructed in [?] for 1996, i.e. the strong connections BEL-DEU-FRA-LUX, IRL-FIN and ESP-PRT with 
respect to the GDP/capita. 
TABLE V: The EU-15 clustering. The second column displays the eigenvector (EVC) whose components are used for building the 
classification scheme. The groups in parentheses are the second-order clusters. 

INDICATOR                       EVC     CLUSTERS
GDP                             v_2     BEL-GBR-ITA-LUX
                                        AUT-DEU-DNK-FRA-PRT
                                        (ESP-FIN-NLD)
Final Consumption Expenditure   v_2     AUT-DEU
                                        (DNK-FIN-FRA-GRC-LUX)
Gross Capital Formation         v_2     BEL-DNK-FIN-GBR-PRT
                                        ESP-FRA
Net Exports                     v_1     AUT-DEU-ITA-PRT
                                        DNK-FRA-GBR-IRL-SWE
Consumer Price Index            v_1     DEU-ITA-GRC-LUX
                                        FIN-FRA-IRL
Rate of Interest                v_2     GBR-LUX-SWE
                                        All the others, except for GRC
Labour Force                    v_1     AUT-BEL-ESP-GBR-LUX
                                        DEU-DNK-ITA-PRT
Unemployment                    v_1     AUT-DEU-FRA-GRC-ITA-SWE
                                        DNK-ESP-FIN-GBR-IRL-LUX-NLD
GDP per hour worked             v_1     DEU-FRA-LUX-PRT
                                        (ESP-GRC-SWE)
GDP per capita                  v_2     BEL-DEU-FRA-GRC-ITA-LUX-SWE
                                        ESP-FIN-IRL-NLD-PRT
Gini coefficient                v_1     AUT-BEL-DEU-DNK-GBR-LUX-NLD-SWE
                                        ESP-FRA-GRC-IRL-ITA-PRT




IV. CONCLUDING REMARKS 

We have shown above that short and noisy macroeconomic time series can be efficiently investigated by moving 
a constant size time window with a constant step over the time span of interest. The statistical distances between 
countries, which are calculated using the linear correlations between the datasets for each time interval, can be used for 
computing the ultrametric distance from each country to a virtually introduced one, called "Average". This method, 
called Moving-Average-Minimal-Length-Path, results in a new set of correlation matrices between country distances 
to their own mean. The new correlation coefficients describe as well as possible the cross-country similarities between 
the macroeconomic indicator fluctuations around the average common trend. 

The distribution of the absolute values of the correlation coefficients is uniform. This can be an 
effect of the relatively small number of data used for computing them (see Table I), but can also be seen as reflecting 
the diversity resulting from the large number of particular factors underlying the time evolution of each ME indicator. As 
in biological systems, the existence of some common patterns does not exclude idiosyncratic diversity. 

The Bipartite Factor Graph connects in the simplest possible way all the countries by means of the corresponding 
variable nodes, identified here with the ME indicators. In spite of its simplicity, the method requires an appropriate 
choice of the threshold value for the correlation coefficients. One way of evaluating the threshold value is 
Student's t-test of statistical significance, as was done in the previous section. We have found the threshold 
value to be near 0.83, at a 98 % confidence level for the statistical significance of the correlation coefficients. 

The Bipartite Factor Graph leads to a clustering scheme in which all the countries are involved (a country can 
only be tied or not tied to the respective variable). For a reliable clustering scheme, more investigation is required, 
particularly concerning the tied countries. This investigation was performed in the previous section by analyzing the 
correlation matrix eigensystems. 

Compared with the similar investigation of stock price clustering, there are some similarities, but also important 
differences. Random Matrix Theory could only be partially used here, leaving aside those results valid in the limit 
of infinite matrices: the finite size effects are much stronger here than in the stock market case. For finding the 
so-called noise band [?], we had to construct the $N \times N$ (N = 15) random matrix having all its rows and columns 
uncorrelated. Its eigenvalue spectrum was plotted in Fig. 2. 

The first two eigenvalues (the largest) are far outside the noise band, thus the so-called chance or noise correlation 
hypothesis can be rejected. Unlike the result obtained for stocks, here the largest eigenvalue does not always reflect 
a collective mode of the system. The few indicators for which this property holds are the ones more sensitive to the 
globalization phenomena. 

Finally, as regards the clustering structure, some overlap with similar results reported in the economic literature 
was found. However, the composition of the clusters most likely varies from one time span to another. What is important 
is the existence of the clusters themselves, as this hierarchical structure emerged in a period in which the globalization 
tendencies were strong and the European common policy was generally oriented to extension and cohesion. In spite 
of all convergent economic policies, the emergence of the clustering structure seems to be inherent to the EU-15 system, 
just as it is inherent, perhaps, to any human community. 